Improving EM Algorithm Estimates for Record Linkage Parameters

Author

  • William E. Yancey
Abstract

The EM algorithm can be used to estimate conditional probabilities for matching field patterns for the Fellegi-Sunter model for record linkage. The algorithm is based on a latent class model for the record pairs where one of the classes is the set of true matches. If the number of true match pairs in the data set is too small, then the EM algorithm cannot detect the correct latent class. We consider methods for enriching the density of matches in the set of examined record pairs in order to obtain improved EM algorithm estimates for the record linkage conditional probability parameters.
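The kind of EM fit described in the abstract can be sketched in Python as follows. This is a minimal illustration, not the paper's implementation: it assumes binary agree/disagree comparisons on each matching field and conditional independence of the fields given match status. The names `em_fellegi_sunter` and `synthetic_pattern_counts`, the starting values, and the example parameters are all hypothetical choices made for this sketch.

```python
from itertools import product


def pattern_prob(gamma, probs):
    """Probability of an agreement pattern under per-field agreement probabilities."""
    result = 1.0
    for agree, q in zip(gamma, probs):
        result *= q if agree else 1.0 - q
    return result


def em_fellegi_sunter(pattern_counts, n_fields, p=0.1, m_init=0.9, u_init=0.1,
                      iters=300, eps=1e-9):
    """Fit a two-class latent class model to agreement-pattern counts.

    pattern_counts maps tuples of 0/1 (disagree/agree per field) to counts.
    Returns (p, m, u): the estimated proportion of matches, P(agree | match)
    per field, and P(agree | nonmatch) per field.
    """
    m = [m_init] * n_fields
    u = [u_init] * n_fields
    total = sum(pattern_counts.values())
    for _ in range(iters):
        # E-step: posterior probability that each pattern belongs to the match class.
        post = {}
        for gamma in pattern_counts:
            pm = p * pattern_prob(gamma, m)
            pu = (1.0 - p) * pattern_prob(gamma, u)
            post[gamma] = pm / max(pm + pu, eps)
        # M-step: re-estimate p, m, u from the posterior-weighted counts.
        w_match = sum(post[g] * c for g, c in pattern_counts.items())
        w_non = total - w_match
        for k in range(n_fields):
            agree_m = sum(post[g] * c for g, c in pattern_counts.items() if g[k])
            agree_u = sum((1.0 - post[g]) * c for g, c in pattern_counts.items() if g[k])
            # Clamp away from 0 and 1 so later iterations never divide by zero.
            m[k] = min(max(agree_m / max(w_match, eps), eps), 1.0 - eps)
            u[k] = min(max(agree_u / max(w_non, eps), eps), 1.0 - eps)
        p = min(max(w_match / total, eps), 1.0 - eps)
    return p, m, u


def synthetic_pattern_counts(p, m, u, n_pairs):
    """Expected pattern counts under known parameters (hypothetical test data)."""
    counts = {}
    for gamma in product((0, 1), repeat=len(m)):
        prob = p * pattern_prob(gamma, m) + (1.0 - p) * pattern_prob(gamma, u)
        counts[gamma] = n_pairs * prob
    return counts


if __name__ == "__main__":
    # Illustrative parameters only; they are not taken from the paper.
    counts = synthetic_pattern_counts(0.2, [0.95, 0.9, 0.85, 0.9],
                                      [0.05, 0.1, 0.2, 0.1], n_pairs=10000)
    print(em_fellegi_sunter(counts, n_fields=4))
```

In this sketch the match proportion `p` plays the role of the density of true matches among the examined pairs. The abstract's point is that when this proportion is very small, the latent class found by EM need not correspond to the true matches, which is the motivation for enriching the set of pairs before fitting.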

Similar resources

Analysis of a Probabilistic Record Linkage Technique without Human Review

We previously developed a deterministic record linkage algorithm demonstrating sensitivities approaching 90% while maintaining 100% specificity. Substantially better performance has been reported using probabilistic linkage techniques; however, such methods often incorporate human review into the process. To avoid human review, we employed an estimator function using the Expectation Maximizatio...

Using the EM Algorithm for Weight Computation in the Fellegi-Sunter Model of Record Linkage

Let A×B be the product space of two sets A and B which is divided into matches (pairs representing the same entity) and nonmatches (pairs representing different entities). Linkage rules are those that divide A×B into links (designated matches), possible links (pairs for which we delay a decision), and nonlinks (designated nonmatches). Under fixed bounds on the error rates, Fellegi and Sunter (1969) p...

Comparison of Estimates Using Record Statistics from Lomax Model: Bayesian and Non Bayesian Approaches

This paper addresses the problem of Bayesian estimation of the parameters, reliability and hazard function in the context of record statistics values from the two-parameter Lomax distribution. The ML and the Bayes estimates based on records are derived for the two unknown parameters and the survival time parameters, reliability and hazard functions. The Bayes estimates are obtained based on conju...

Methods for Record Linkage and Bayesian Networks

Although terminology differs, there is considerable overlap between record linkage methods based on the Fellegi-Sunter model (JASA 1969) and Bayesian networks used in machine learning (Mitchell 1997). Both are based on formal probabilistic models that can be shown to be equivalent in many situations (Winkler 2000). When no missing data are present in identifying fields and training data are ava...

Supplemental Material: A Hidden Markov Approach for Ascertaining SNP Genotypes from Next Generation Sequencing Data in Presence of Allelic Imbalance by Exploiting Linkage Disequilibrium

In this section, we provide details of the EM algorithm for obtaining the maximum likelihood estimates (MLE) of $\theta$, where $\theta = (\alpha_1, \beta_1, \alpha_2, \beta_2, e, A)^T$ and $A = (a_{kk'})_{k,k'=1,\dots,M}$ are parameters in the transition matrix. To this end, we introduce the following complete data corresponding to the observed data $X$: $Y = \{G_{il}, \delta_{il}, X_{il} : l = 1, \dots, L\}$ for $i = 1, \dots, n$. The likelihood function for...

Publication date: 2002